technical quality
Evaluating Speech-to-Text x LLM x Text-to-Speech Combinations for AI Interview Systems
Allbert, Rumi, Yazdani, Nima, Ansari, Ali, Mahajan, Aruj, Afsharrad, Amirhossein, Mousavi, Seyed Shahabeddin
Voice-based conversational AI systems increasingly rely on cascaded architectures that combine speech-to-text (STT), large language models (LLMs), and text-to-speech (TTS) components. We present a large-scale empirical comparison of STT x LLM x TTS stacks using data sampled from over 300,000 AI-conducted job interviews. We used an LLM-as-a-Judge automated evaluation framework to assess conversational quality, technical accuracy, and skill assessment capabilities. Our analysis of five production configurations reveals that a stack combining Google's STT, GPT-4.1, and Cartesia's TTS outperforms alternatives in both objective quality metrics and user satisfaction scores. Surprisingly, we find that objective quality metrics correlate weakly with user satisfaction scores, suggesting that user experience in voice-based AI systems depends on factors beyond technical performance. Our findings provide practical guidance for selecting components in multimodal conversations and contribute a validated evaluation methodology for human-AI interactions.
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
Patients Speak, AI Listens: LLM-based Analysis of Online Reviews Uncovers Key Drivers for Urgent Care Satisfaction
Xu, Xiaoran, Xue, Zhaoqian, Zhang, Chi, Medri, Jhonatan, Xiong, Junjie, Zhou, Jiayan, Jin, Jin, Zhang, Yongfeng, Ma, Siyuan, Li, Lingyao
Investigating the public experience of urgent care facilities is essential for promoting community healthcare development. Traditional survey methods often fall short due to limited scope, time, and spatial coverage. Crowdsourcing through online reviews or social media offers a valuable approach to gaining such insights. With recent advancements in large language models (LLMs), extracting nuanced perceptions from reviews has become feasible. This study collects Google Maps reviews across the DMV and Florida areas and conducts prompt engineering with the GPT model to analyze the aspect-based sentiment of urgent care. We first analyze the geospatial patterns of various aspects, including interpersonal factors, operational efficiency, technical quality, finances, and facilities. Next, we determine Census Block Group (CBG)-level characteristics underpinning differences in public perception, including population density, median income, Gini index, rent-to-income ratio, household below-poverty rate, uninsured rate, and unemployment rate. Our results show that interpersonal factors and operational efficiency emerge as the strongest determinants of patient satisfaction in urgent care, while technical quality, finances, and facilities show no significant independent effects after adjustment in multivariate models. Among socioeconomic and demographic factors, only population density demonstrates a significant but modest association with patient ratings, while the remaining factors exhibit no significant correlations. Overall, this study highlights the potential of crowdsourcing to uncover the key factors that matter to residents and provide valuable insights for stakeholders to improve public satisfaction with urgent care.
- North America > United States > Florida > Hillsborough County > Tampa (0.14)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > Virginia (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
Google AI can now tell which photos you'll think are beautiful
Beauty is in the eye of the beholder, or so the saying goes, and the same is often true when trying to pick out a perfect photograph. Say you've got ten relatively similar shots of a loved one, family pet, or a stunning landscape – which one is the perfect shot and, crucially, why? It's a tough question to answer as there are multiple factors at play. It could be the shot that is the most competent, with no sign of any pesky blur or noise, but, on the other hand, it could also be the shot that catches the light in a way that makes it more appealing than the rest, even if it isn't technically the best of the bunch. Even if we're not aware of it, the human brain tends to strike a balance between technical quality and aesthetic preference when judging photos.
AAAI-13 Preface
desJardins, Marie (University of Maryland Baltimore County) | Littman, Michael (Rutgers University)
Welcome to the Twenty-Seventh AAAI Conference on Artificial Intelligence, AAAI-13! As can be seen in these proceedings, AI's scope and influence continue to grow. This year, we received 827 submissions across a variety of tracks, allowing us to put together a diverse and exciting technical program featuring the field's top research. The AAAI-13 program seeks to capture the diversity of this important field. The main technical program features four special tracks -- AI and the Web, Cognitive Systems, Computational Sustainability, and AI and Robotics -- which highlight specialized areas of the field.
- North America > United States > California (0.15)
- North America > United States > Texas > Travis County > Austin (0.05)
- North America > United States > Pennsylvania (0.05)
- North America > United States > Colorado (0.05)